In the last module, we worked with the tidyverse.

In this module, we will create plots using the highcharter package.
| Objective | Complete |
|---|---|
| Introduce using the highcharter package to build interactive visualizations | |
| Use highcharter with tidy data to create a scatterplot | |
| Visualize a correlation plot with hchart | |
| Build a column plot with hchart | |
| Create a boxplot with hchart | |
| Save interactive plots with htmlwidgets | |
Set up the directory variables: let `main_dir` be the variable corresponding to your `skillsoft` folder.
# Set `main_dir` to the location of your `skillsoft` folder (for Mac/Linux).
main_dir = "~/Desktop/skillsoft"
# Set `main_dir` to the location of your `skillsoft` folder (for Windows).
main_dir = "C:/Users/[username]/Desktop/skillsoft"
# Make `data_dir` from the `main_dir` and remainder of the path to data directory.
data_dir = paste0(main_dir, "/data")
# Make `plot_dir` from the `main_dir` and remainder of the path to plots directory.
plot_dir = paste0(main_dir, "/plots")
# Set directory to data_dir.
setwd(data_dir)
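As a hedged alternative to paste0(), base R's file.path() joins path components with a separator, and dir.create() can ensure the folders exist before we read or save files. The sketch below uses a temporary folder as a stand-in for the skillsoft folder so it runs anywhere. Replace it with your own main_dir.

```r
# Stand-in for your `skillsoft` folder (assumption: a temporary location for demonstration).
main_dir <- file.path(tempdir(), "skillsoft")

# Build the data and plots paths; file.path() inserts "/" between the parts.
data_dir <- file.path(main_dir, "data")
plot_dir <- file.path(main_dir, "plots")

# Create both folders if they do not exist yet (no warning if they already do).
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(plot_dir, recursive = TRUE, showWarnings = FALSE)

# Confirm that both folders are present.
dir.exists(c(data_dir, plot_dir))
```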
Like ggplot2 and other plotting libraries (including base R itself), highcharter allows us to build complex, customized, and meaningful visualizations.
Highcharter is an R wrapper that allows R users to tap into Highcharts, one of the most comprehensive JavaScript-based data visualization libraries.
Highcharts is free for individual research and non-profit purposes. Its use does come with restrictions, and you may need to obtain a license if you integrate it into software or company-wide products.
Let's install the package and check its documentation.
# Install `highcharter` package.
install.packages("highcharter")
# Load the library.
library(highcharter)
# View documentation.
library(help = "highcharter")
?highchart
In ggplot2, we create a plot by calling the main plotting function ggplot(). In highcharter, the main function is highchart(). Its shorthand, hchart(), has the following general form (see ?hchart):
hchart(Some_data, #<- dataset to use
"plot_type", #<- plot type to use
hcaes(x = variable1, #<- x-axis mapping
y = variable2, #<- y-axis mapping
group = variable3, #<- group by
...))
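To make the template concrete, here is a minimal sketch that uses R's built-in iris dataset as a stand-in for Some_data (an illustrative choice, not the course data). Sepal length goes on the x-axis and petal length on the y-axis, with one series per species.

```r
library(highcharter)

# Built-in iris data stands in for `Some_data`.
example_scatter <- hchart(iris,
                          "scatter",                  # plot type
                          hcaes(x = Sepal.Length,     # x-axis mapping
                                y = Petal.Length,     # y-axis mapping
                                group = Species))     # one series per species

example_scatter
```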
hchart is a shorthand version of the highchart function, which takes a few key arguments to plot:
- data: the dataset to use
- type: the plot type (scatter, bar, column, line, etc.)
- hcaes (i.e., highcharts aesthetics): the mapping of variables (works exactly the same way as aes() in ggplot2!)

Like ggplot2, the highcharter library has its own vocabulary: the kind of plot is called a type in highcharter. Here are some widely used ones:

| Highcharter series type | Plot type |
|---|---|
| scatter | scatterplot |
| line | line graph |
| boxplot | boxplot |
| column | bar plot |
| bar | horizontal bar plot |
| histogram | histogram |
| area | area chart (e.g., a density curve) |
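hchart() also dispatches on the class of the object it receives. This makes it easy to try the histogram and area types from the table. In the minimal sketch below, which again uses the built-in iris data as an assumption, a plain numeric vector produces a histogram. A density() object drawn with type = "area" gives a filled density curve.

```r
library(highcharter)

# A numeric vector: hchart() draws a histogram of its values.
sepal_histogram <- hchart(iris$Sepal.Length)

# A density object: hchart() draws the estimated curve; type = "area" fills it.
sepal_density <- hchart(density(iris$Sepal.Length), type = "area")

sepal_histogram
sepal_density
```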
| Objective | Complete |
|---|---|
| Introduce using the highcharter package to build interactive visualizations | ✔ |
| Use highcharter with tidy data to create a scatterplot | |
| Visualize a correlation plot with hchart | |
| Build a column plot with hchart | |
| Create a boxplot with hchart | |
| Save interactive plots with htmlwidgets | |
Let's load the CMP_subset dataset from our data_dir into R's environment.
# Set working directory to where we store data.
setwd(data_dir)
library(tidyverse)
# Read CSV file
CMP_subset = read.csv("CMP_subset.csv",
header = TRUE)
Now, tidy the data as before and transform it from wide to long for easy visualization
# Prep data for univariate plots
CMP_subset_long = CMP_subset %>%
gather(key = "variable",
value = "value")
# Make names of processes and materials more user friendly and readable.
CMP_subset_long = CMP_subset_long %>%
mutate(variable =
str_replace(variable,
"Biological", "Bio ")) %>%
mutate(variable =
str_replace(variable,
"Manufacturing", "Man. ")) %>%
mutate(variable =
str_replace(variable,
"0", " ")) %>%
group_by(variable) %>% #<- normalize
mutate(norm_value =
value/max(value, na.rm = TRUE))
# Prep data for scatterplot
CMP_subset_long2 = CMP_subset %>%
gather(BiologicalMaterial01:ManufacturingProcess03,
key = "variable",
value = "value") %>%
# All other transformations we've done before.
mutate(variable = str_replace(variable, "Biological", "Bio ")) %>%
mutate(variable = str_replace(variable, "Manufacturing", "Man. ")) %>%
mutate(variable = str_replace(variable, "0", " ")) %>%
group_by(variable) %>%
mutate(norm_value = value/max(value, na.rm = TRUE))
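In current versions of tidyr, gather() is superseded by pivot_longer(). The newer function performs the same wide-to-long reshape with more explicit argument names. Below is a minimal sketch on a small made-up data frame that uses the same column naming as CMP_subset. The values are illustrative, not taken from the course data.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy data in the same wide layout as CMP_subset (values made up for illustration).
toy <- data.frame(Yield = c(38, 42.4),
                  BiologicalMaterial01 = c(6.25, 8.01),
                  ManufacturingProcess01 = c(NA, 2))

toy_long <- toy %>%
  # Same reshape as gather(BiologicalMaterial01:ManufacturingProcess01, ...):
  # `Yield` stays as its own column.
  pivot_longer(cols = BiologicalMaterial01:ManufacturingProcess01,
               names_to = "variable",
               values_to = "value") %>%
  # Shorten the variable names exactly as in the module.
  mutate(variable = str_replace(variable, "Biological", "Bio "),
         variable = str_replace(variable, "Manufacturing", "Man. "),
         variable = str_replace(variable, "0", " ")) %>%
  # Normalize each variable by its own maximum, ignoring missing values.
  group_by(variable) %>%
  mutate(norm_value = value / max(value, na.rm = TRUE)) %>%
  ungroup()

toy_long
```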
head(CMP_subset_long2,3)
# A tibble: 3 x 4
# Groups: variable [1]
Yield variable value norm_value
<dbl> <chr> <dbl> <dbl>
1 38 Bio Material 1 6.25 0.709
2 42.4 Bio Material 1 8.01 0.909
3 42.0 Bio Material 1 8.01 0.909
To construct a scatterplot with highcharter, we use the hchart() function and pass it the data, the plot type (scatter), and the aesthetics as arguments.
# Construct an interactive scatterplot.
scatter_interactive = #<- name the plot
hchart(CMP_subset_long2, #<- set data
"scatter", #<- plot type "scatter"
hcaes(x = norm_value, #<- set aesthetics to map x-axis
y = Yield, #<- set aesthetics to map y-axis
group = variable)) #<- group by
Print the plot; it appears in the Viewer pane, right next to the Help tab.
scatter_interactive
In highcharts, every plotted category seen in the legend is called a series.
In highcharter, every new option or layer can be added with the already familiar pipe operator (as opposed to the + operator in ggplot2).
The hc_chart function is responsible for global chart options like zoom, size, and theme.
To enable zooming, pass the zoomType argument to hc_chart. Setting it to "xy" lets us zoom across both the x and y axes.
# Pipe chart options to original chart.
scatter_interactive = scatter_interactive %>%
# Use chart options to specify zoom.
hc_chart(zoomType = "xy")
scatter_interactive
A title can be added to highcharter plots using the hc_title() function
# Pipe chart options to original chart.
scatter_interactive = scatter_interactive %>%
# Add title to the plot.
hc_title(text = "CMP data: Yield vs. other variables")
scatter_interactive
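A subtitle and axis titles are added the same way: each option gets its own piped hc_* function. The minimal sketch below uses the built-in iris data as an illustrative stand-in. It combines the zoom and title options above with hc_subtitle(), hc_xAxis(), and hc_yAxis().

```r
library(highcharter)

labelled_scatter <- hchart(iris, "scatter",
                           hcaes(x = Sepal.Length, y = Petal.Length, group = Species)) %>%
  hc_chart(zoomType = "xy") %>%                                # zoom on both axes
  hc_title(text = "iris: petal length vs. sepal length") %>%   # main title
  hc_subtitle(text = "Grouped by species") %>%                 # subtitle under the title
  hc_xAxis(title = list(text = "Sepal length (cm)")) %>%       # x-axis label
  hc_yAxis(title = list(text = "Petal length (cm)"))           # y-axis label

labelled_scatter
```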
| Objective | Complete |
|---|---|
| Introduce using the highcharter package to build interactive visualizations | ✔ |
| Use highcharter with tidy data to create a scatterplot | ✔ |
| Visualize a correlation plot with hchart | |
| Build a column plot with hchart | |
| Create a boxplot with hchart | |
| Save interactive plots with htmlwidgets | |
hchart recognizes the type of data it is given, so to draw a correlation plot we only need to pass it a correlation matrix.
# Compute a correlation matrix for the first
# 4 variables in our data.
cor_matrix = cor(CMP_subset[, 1:4])
# Construct a correlation plot by
# simply giving the plotting function
# a correlation matrix.
correlation_interactive = hchart(cor_matrix) %>%
# Add title to the plot.
hc_title(text = "CMP data: correlation")
correlation_interactive
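By default, cor() returns NA for any pair of columns that contains a missing value, which would leave empty cells in the correlation plot. The hedged sketch below uses four mtcars columns with one value deliberately set to NA as a stand-in. It shows the use argument, which computes each correlation from the complete pairs of observations instead.

```r
library(highcharter)

# Stand-in data: four mtcars columns with one value deliberately removed.
d <- mtcars[, c("mpg", "cyl", "disp", "hp")]
d$mpg[1] <- NA

# Default use = "everything": every correlation involving `mpg` becomes NA.
cor_default <- cor(d)

# use = "pairwise.complete.obs": each pair uses the rows where both values are present.
cor_pairwise <- cor(d, use = "pairwise.complete.obs")

cor_default["mpg", "cyl"]   # NA
anyNA(cor_pairwise)         # FALSE

# The completed matrix can be plotted exactly like `cor_matrix` above.
hchart(cor_pairwise) %>%
  hc_title(text = "mtcars: correlation (pairwise complete)")
```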
| Objective | Complete |
|---|---|
| Introduce using the highcharter package to build interactive visualizations | ✔ |
| Use highcharter with tidy data to create a scatterplot | ✔ |
| Visualize a correlation plot with hchart | ✔ |
| Build a column plot with hchart | |
| Create a boxplot with hchart | |
| Save interactive plots with htmlwidgets | |
Let's now create an interactive plot to visualize the summary of our data
# Create data summary.
CMP_summary = summary(CMP_subset)
# Save it as a data frame.
CMP_summary = as.data.frame(CMP_summary)
# Inspect the data.
head(CMP_summary)
Var1 Var2 Freq
1 Yield Min. :35.25
2 Yield 1st Qu.:38.75
3 Yield Median :39.97
4 Yield Mean :40.18
5 Yield 3rd Qu.:41.48
6 Yield Max. :46.34
# Remove an empty variable.
CMP_summary$Var1 = NULL
# Rename remaining columns.
colnames(CMP_summary) = c("Variable",
"Summary")
# Inspect updated data.
head(CMP_summary)
Variable Summary
1 Yield Min. :35.25
2 Yield 1st Qu.:38.75
3 Yield Median :39.97
4 Yield Mean :40.18
5 Yield 3rd Qu.:41.48
6 Yield Max. :46.34
To split the Summary column into the name of the statistic and its value, we use the separate function in tidyr.
# Separate `Summary` column into 2 columns.
CMP_summary = CMP_summary %>% #<- set original data
separate(Summary, #<- separate `Summary` variable
into = c("Statistic", "Value"), #<- into 2 columns: `Statistic`, `Value`
sep = ":", #<- set separating character
convert = TRUE) #<- where applicable convert data (to numeric)
# Inspect the first few entries in data.
head(CMP_summary)
Variable Statistic Value
1 Yield Min. 35.25
2 Yield 1st Qu. 38.75
3 Yield Median 39.97
4 Yield Mean 40.18
5 Yield 3rd Qu. 41.48
6 Yield Max. 46.34
# Inspect total number of rows in data including NAs.
nrow(CMP_summary)
[1] 49
# Inspect `Value` column for `NAs`.
which(is.na(CMP_summary$Value) == TRUE)
[1] 7 14 21 28
# Subset only rows where `Value` is not NAs.
CMP_summary = subset(CMP_summary, !is.na(Value))
# Now the number of rows should be 4 less.
nrow(CMP_summary)
[1] 45
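Parsing the text printed by summary() works. However, summary() formats its numbers as text, typically rounded to about four significant digits. The hedged alternative sketch below computes the same six statistics directly with dplyr and reshapes them into the same Variable / Statistic / Value layout. The small data frame is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up data standing in for CMP_subset (column `a` has a missing value).
d <- data.frame(a = c(1, 2, 3, 4, NA),
                b = c(10, 20, 30, 40, 50))

stats_long <- d %>%
  # One row per (Variable, value) pair.
  pivot_longer(everything(), names_to = "Variable", values_to = "x") %>%
  group_by(Variable) %>%
  # The same six statistics summary() reports, ignoring NAs.
  summarise(`Min.`    = min(x, na.rm = TRUE),
            `1st Qu.` = quantile(x, 0.25, na.rm = TRUE, names = FALSE),
            Median    = median(x, na.rm = TRUE),
            Mean      = mean(x, na.rm = TRUE),
            `3rd Qu.` = quantile(x, 0.75, na.rm = TRUE, names = FALSE),
            `Max.`    = max(x, na.rm = TRUE)) %>%
  # Back to long format: one row per Variable and Statistic.
  pivot_longer(-Variable, names_to = "Statistic", values_to = "Value")

stats_long
```

The result can be passed to hchart() with the same "column" call used below.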
# Construct the summary chart.
CMP_summary_interactive =
hchart(CMP_summary, #<- set data
"column", #<- set type (`column` in highcharts)
hcaes(x = Statistic, #<- arrange `Statistic` values across x-axis
y = Value, #<- map `Value` of each `Statistic` to y-axis
group = Variable)) #<- group columns by `Variable`
CMP_summary_interactive
Each statistic on the x-axis has one column per variable. It would therefore be convenient for the tooltip to show information about the whole group rather than the individual columns within the group.
We can adjust the tooltip options of the chart using the hc_tooltip function.
The shared option is often used to share a tooltip between members of a group.
# Adjust tooltip options by piping `hc_tooltip` to base plot.
CMP_summary_interactive = CMP_summary_interactive %>%
hc_tooltip(shared = TRUE) %>% #<- `shared` needs to be set to `TRUE`
hc_title(text = "CMP data variable summary") #<- add title to your plot
CMP_summary_interactive
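Other Highcharts tooltip options can be passed to hc_tooltip() in the same call. The minimal sketch below uses a small made-up summary table, not the course data. It sets valueDecimals, a Highcharts tooltip option, to round the values shown in the shared tooltip to two decimals.

```r
library(highcharter)

# Made-up long-format summary: one row per Variable and Statistic.
toy_summary <- data.frame(
  Variable  = rep(c("A", "B"), each = 3),
  Statistic = rep(c("Min.", "Median", "Max."), times = 2),
  Value     = c(1.234, 2.5, 4.019, 10.1, 30.55, 50.789)
)

toy_column <- hchart(toy_summary, "column",
                     hcaes(x = Statistic, y = Value, group = Variable)) %>%
  hc_tooltip(shared = TRUE,          # one tooltip lists every Variable at a Statistic
             valueDecimals = 2) %>%  # round the values displayed in the tooltip
  hc_title(text = "Toy summary with a shared tooltip")

toy_column
```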
| Objective | Complete |
|---|---|
| Introduce using the highcharter package to build interactive visualizations | ✔ |
| Use highcharter with tidy data to create a scatterplot | ✔ |
| Visualize a correlation plot with hchart | ✔ |
| Build a column plot with hchart | ✔ |
| Create a boxplot with hchart | |
| Save interactive plots with htmlwidgets | |
hcboxplot allows us to create an interactive boxplot. It needs two arguments:
- x requires the numeric data to be plotted along the x-axis (the boxplot in highcharts is horizontal by default)
- var requires the categorical data to be plotted along the y-axis

The general form is as follows (see ?hcboxplot):
hcboxplot(x = Numeric_data_vector,
var = Categorical_data_vector,
...)
# Construct an interactive boxplot.
boxplot_interactive =
hcboxplot(x = CMP_subset_long$norm_value,
var = CMP_subset_long$variable,
name = "Normalized value") %>%
hc_title(text = "CMP data variables")
boxplot_interactive
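Recent versions of highcharter flag hcboxplot() as deprecated and point to data_to_boxplot() instead. That function computes the box statistics from a data frame and returns series that can be added to a chart. The hedged sketch below follows that pattern, using the built-in iris data as a stand-in. Check ?data_to_boxplot for the arguments in your installed version.

```r
library(highcharter)

# Compute box statistics of Sepal.Length for each Species (iris stands in for our data).
box_series <- data_to_boxplot(iris, Sepal.Length, Species, name = "Sepal length")

# Start an empty chart, use categories on the x-axis, and add the computed series.
boxplot_alt <- highchart() %>%
  hc_xAxis(type = "category") %>%
  hc_add_series_list(box_series) %>%
  hc_title(text = "iris: sepal length by species")

boxplot_alt
```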
To set series options for various plot types, we use the hc_plotOptions function.
All available options are listed in the Highcharts API documentation: https://api.highcharts.com/highcharts/plotOptions
Now we will use the hc_plotOptions() function to customize the colors of our boxplot.
# Enhance original boxplot with color options.
boxplot_interactive = boxplot_interactive %>%
hc_plotOptions( #<- plot options
boxplot = list( #<- for boxplot
colorByPoint = TRUE)) #<- color each box
boxplot_interactive
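hc_plotOptions() takes one named list per series type, plus a series list for options that apply to every type. The names inside each list follow the Highcharts API. The minimal sketch below uses iris as an illustrative stand-in. It turns off the load animation for all series and shrinks the scatter markers.

```r
library(highcharter)

small_markers <- hchart(iris, "scatter", hcaes(x = Sepal.Length, y = Petal.Length)) %>%
  hc_plotOptions(
    series  = list(animation = FALSE),          # applies to every series type
    scatter = list(marker = list(radius = 3))   # applies to scatter series only
  )

small_markers
```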
| Objective | Complete |
|---|---|
| Introduce using the highcharter package to build interactive visualizations | ✔ |
| Use highcharter with tidy data to create a scatterplot | ✔ |
| Visualize a correlation plot with hchart | ✔ |
| Build a column plot with hchart | ✔ |
| Create a boxplot with hchart | ✔ |
| Save interactive plots with htmlwidgets | |
The htmlwidgets package lets us use JavaScript visualization libraries from the R console. See the htmlwidgets documentation for details.
# Install `htmlwidgets` package.
install.packages("htmlwidgets")
# Load the library.
library(htmlwidgets)
# View documentation.
library(help = "htmlwidgets")
# Set working directory to where you save plots.
setwd(plot_dir)
# Save desired interactive plot to an HTML file.
saveWidget(scatter_interactive, #<- plot object to save
"interactive_scatterplot.html", #<- name of file to where the plot is to be saved
selfcontained = TRUE) #<- set `selfcontained` to TRUE, so that
# all necessary files and scripts are embedded
# into the HTML file itself
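saveWidget() accepts a full file path, so instead of changing the working directory we can build the path with file.path(). The minimal sketch below saves a small iris chart, a stand-in for scatter_interactive, to a temporary folder. It uses selfcontained = FALSE because, depending on your htmlwidgets version, selfcontained = TRUE may require pandoc. With selfcontained = FALSE, the JavaScript dependencies are written to a separate folder next to the HTML file.

```r
library(highcharter)
library(htmlwidgets)

# A small chart standing in for `scatter_interactive`.
demo_chart <- hchart(iris, "scatter", hcaes(x = Sepal.Length, y = Petal.Length))

# Build the output path without calling setwd(); tempdir() stands in for `plot_dir`.
out_file <- file.path(tempdir(), "demo_scatterplot.html")

# selfcontained = FALSE: dependencies go into a folder beside the HTML file.
saveWidget(demo_chart, out_file, selfcontained = FALSE)

file.exists(out_file)
```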
| Objective | Complete |
|---|---|
| Introduce using the highcharter package to build interactive visualizations | ✔ |
| Use highcharter with tidy data to create a scatterplot | ✔ |
| Visualize a correlation plot with hchart | ✔ |
| Build a column plot with hchart | ✔ |
| Create a boxplot with hchart | ✔ |
| Save interactive plots with htmlwidgets | ✔ |
In this module, we:
- Introduced the highcharter package for creating interactive visualizations
- Built interactive plots with the hchart() function of highcharter
- Customized plots with highchart options and parameters
- Saved interactive plots with htmlwidgets so that they can be embedded in R Markdown and R Shiny applications

In the next module, we will: